Abstract. The inference of new information on the relatedness of species by phylogenetic trees based on DNA data is one of the main challenges of modern biology. But despite all technological advances, DNA sequencing is still a time-consuming and costly process. Therefore, decision criteria would be desirable to decide a priori which data might contribute new information to the supertree which is not explicitly displayed by any input tree. A new concept, so-called groves, to identify taxon sets with the potential to construct such informative supertrees was suggested by Ane et al. in 2009. But the important conjecture that maximal groves can easily be identified in a database remained unproven and was published on the Isaac Newton Institute's list of open phylogenetic problems. In this paper, we show that the conjecture does not generally hold, but we also introduce a new concept, namely 2-overlap groves, which overcomes this problem.
1. Introduction

One of the main challenges of biological sciences is the reconstruction of the 
'Tree of Life', i.e. the phylogenetic tree displaying all living species on earth. 
The genetic sequence data on some clusters of species are already available 
in databases like GenBank or SwissProt, and there are algorithms available 
to reconstruct the tree of each cluster. Unfortunately, many frequently used 
tree inference methods like maximum parsimony or maximum likelihood 
are known to be NP-hard ([2], [3], [4]), which is why for such a huge number of species only heuristics can be used, but these are usually not very reliable given such amounts of data. Therefore, the Tree of Life cannot be constructed all at once. Instead, scientists depend on supertree methods to combine known phylogenies on fewer taxa ([5], [6], [7]). But even if there is
no conflict amongst the input trees, not all supertrees reveal new informa- 
tion: for instance, if the input trees have no shared taxa, they can be com- 
bined in any possible way and therefore do not lead to new conclusions on 
the relatedness of the species involved. In order to avoid this problem, Ane 
et al. [1] suggested using the concept of groves, i.e. sets of clusters with the potential to construct informative supertrees. As this potential merely depends on certain overlap properties of the input taxon sets, no a priori knowledge of the underlying phylogenies is required. However, in order for groves to be
useful for practical purposes, they should be easily identifiable in databases. 
Regarding this question, Ane et al. state the following conjecture: 

Conjecture 1.1. The following equivalent properties are true. 

1. For any set $\mathcal{S}$ of taxon sets, the set of maximal groves in $\mathcal{S}$ is a partition of $\mathcal{S}$.

2. If two groves intersect, their union is a grove. 

3. Two maximal groves do not intersect. 

Obviously, property (1) would reduce the search for maximal groves at least to a search on all possible partitions of the set of taxon sets under investigation, even if this might still be hard. In fact, the above conjecture attracted a lot of attention. It was published both on the Isaac Newton Institute's list of open phylogenetic problems in 2007 (see http://www.newton.ac.uk/programmes) and on the 'Penny Ante' list of the Annual New Zealand Phylogenetics Meeting in Kaikoura in 2009 (see http://www.math.canterbury.ac.nz/bio/events/kaikoura09/penny.shtml).

In this paper, we first show that the concept of groves can be simplified by introducing tripartition groves, and then we prove that, unfortunately, the conjecture is not true in general. We show this by presenting an explicit counterexample to property (2) of Conjecture 1.1. We also prove that the conjecture fails even when the definition of groves is made more restrictive to enforce informativeness. Despite these negative results, we also characterize a new concept of groves, namely 2-overlap groves, which guarantees property (2), and hence the whole conjecture, to hold.



2. Preliminaries 

The main idea of the grove concept is to decide in advance, i.e. before even constructing any phylogeny, whether a set of various taxon sets has the potential to deliver new information when being combined into one common supertree. In the context of rooted phylogenies, 'new information' of a supertree refers to resolving at least one triple of taxa which is not resolved by any of the input trees, as triples are the smallest informative unit in the rooted setting. Such triples, which get resolved by a supertree but not by any of the input phylogenies, are called resolved cross triples. In order to define groves



Mathematical aspects of phylogenetic groves 






explicitly, we therefore need some formal definitions of a phylogeny and of 
(resolved) cross triples. 

Recall that a rooted binary phylogenetic X-tree is a tree $T = (V(T), E(T))$, with vertex set $V(T)$ and edge set $E(T)$, on a leaf (taxon) set $X = \{1, \dots, n\} \subseteq V(T)$ with only vertices of degree 1 (leaves) or 3 (internal vertices) and one vertex of degree 2, which is called the root. In this paper, when there is no ambiguity, we often just write 'tree' or 'phylogeny' when referring to a rooted binary phylogenetic X-tree.

A topology assignment $\mathcal{P}$ on a set $\mathcal{S}$ of taxon sets is a set of trees such that a) for each taxon set in $\mathcal{S}$ there is exactly one tree in $\mathcal{P}$ and b) all these trees are compatible, i.e. they can be displayed by one common tree, which is then called a supertree.

We are now in a position to define cross triples. In the following, we denote by $\mathcal{L}(\mathcal{S})$ the set of all taxa of a set of taxon sets $\mathcal{S}$, i.e. $\mathcal{L}(\mathcal{S}) = \{x : \exists S \in \mathcal{S} : x \in S\}$.

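Definition 2.1 (Cross triple). Let $\mathcal{S}$ be a set of taxon sets $X_1, \dots, X_m$ and let $\pi = \mathcal{S}_1 | \dots | \mathcal{S}_k$ be a partition of $\mathcal{S}$, i.e. $\bigcup_{i=1}^{k} \mathcal{S}_i = \mathcal{S}$ and $\mathcal{S}_i \cap \mathcal{S}_j = \emptyset$ for all $i \neq j$. Then, a cross triple of $\mathcal{S}$ with respect to $\pi$ is a set of three taxa $\{x, y, z\} \subseteq \mathcal{L}(\mathcal{S})$ such that no $\mathcal{S}_i$ contains all three of them.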
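A minimal Python sketch of Definition 2.1 may help to make it concrete. It assumes a representation of our own choosing, not one taken from [1]: taxa are strings, taxon sets are Python sets, and a partition is a list of blocks, each block being a list of taxon sets. The names `taxa` and `cross_triples` are hypothetical helpers, and the data is the set of Example 1 below.

```python
from itertools import combinations


def taxa(sets):
    """L(S): all taxa occurring in a collection of taxon sets."""
    return set().union(*sets)


def cross_triples(partition):
    """Cross triples w.r.t. a partition (Definition 2.1).

    `partition` is a list of blocks; each block is a list of taxon sets.
    A triple is a cross triple if no block's taxa contain all three taxa.
    """
    block_taxa = [taxa(block) for block in partition]
    all_taxa = set().union(*block_taxa)
    return [frozenset(t) for t in combinations(sorted(all_taxa), 3)
            if not any(set(t) <= bt for bt in block_taxa)]


# S = {{v,w,x}, {x,y,z}} with pi = {v,w,x} | {x,y,z}
pi = [[{"v", "w", "x"}], [{"x", "y", "z"}]]
triples = cross_triples(pi)
print(len(triples))                           # 8
print(frozenset({"v", "w", "y"}) in triples)  # True
```

Only {v,w,x} and {x,y,z} themselves fail the test here, so 8 of the 10 triples on five taxa are cross triples.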

Recall that for three taxa x, y, z there are exactly three possible rooted binary phylogenetic tree topologies, namely ((x,y),z), ((x,z),y) and ((y,z),x), as shown in Figure 1. A topology assignment on a set $\mathcal{S}$ of taxon sets might lead to various possible supertrees, and these trees need not agree on how three taxa x, y, z are related. In order to formally characterize this, we now define what it means for a cross triple to be resolved.






Figure 1. Three taxa x, y, z can be resolved in three different ways, which means three different rooted tree topologies are possible.
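The three topologies of Figure 1 can be enumerated mechanically. The following Python sketch rests on our own assumptions: a rooted binary tree is written as nested 2-tuples (e.g. `(("x", "y"), "z")` for ((x,y),z)), all rooted binary trees on a leaf set are generated by inserting one leaf at a time above the root and above every edge, and the cherry of a triple is read off as the pair whose lowest common ancestor lies deepest. All function names are hypothetical.

```python
def rooted_trees(leaves):
    """All rooted binary trees on `leaves` as nested 2-tuples, built by
    inserting each new leaf above the root and above every edge."""
    leaves = list(leaves)
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in _insertions(tree, leaf)]
    return trees


def _insertions(tree, leaf):
    yield (tree, leaf)  # new leaf becomes the sibling of this whole subtree
    if isinstance(tree, tuple):
        left, right = tree
        for t in _insertions(left, leaf):
            yield (t, right)
        for t in _insertions(right, leaf):
            yield (left, t)


def leaf_paths(tree, path=()):
    """Map each leaf to its root-to-leaf path, a tuple of child indices."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths


def cherry(tree, x, y, z):
    """Return {a, b} such that `tree` restricted to {x, y, z} is ((a, b), c):
    the pair whose lowest common ancestor lies deepest."""
    paths = leaf_paths(tree)

    def lca_depth(a, b):  # length of the common prefix of both paths
        depth = 0
        for p, q in zip(paths[a], paths[b]):
            if p != q:
                break
            depth += 1
        return depth

    return frozenset(max([(x, y), (x, z), (y, z)], key=lambda ab: lca_depth(*ab)))


for t in rooted_trees("xyz"):
    print(t, "->", sorted(cherry(t, "x", "y", "z")))
# (('x', 'y'), 'z') -> ['x', 'y']
# (('x', 'z'), 'y') -> ['x', 'z']
# ('x', ('y', 'z')) -> ['y', 'z']
```

The number of rooted binary trees grows as (2n-3)!! in the number n of leaves (15 for four taxa, 105 for five), so this brute-force enumeration is only practical for a handful of taxa.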



Definition 2.2 (Resolved cross triple). Let $\mathcal{S} = \{X_1, \dots, X_m\}$ be a set of taxon sets and let $\pi = \mathcal{S}_1 | \dots | \mathcal{S}_k$ be a partition of $\mathcal{S}$. Let $\{x, y, z\}$ be a cross triple of $\mathcal{S}$ with respect to $\pi$. Then $\{x, y, z\}$ is called resolved if there is a topology assignment on $\mathcal{S}$ such that all possible supertrees of this assignment display the same one of the three possible rooted trees on $\{x, y, z\}$.
Example.

1. The set $\mathcal{S} = \{\{v,w,x\}, \{x,y,z\}\}$ does not have any resolved cross triple with respect to $\pi = \{v,w,x\} | \{x,y,z\}$, because no matter which topology assignment is chosen, the two input trees intersect in only one taxon, namely x. Therefore, they can be combined in all possible ways such that no cross triple, e.g. $\{v,w,y\}$, gets resolved (this is formally shown in [1, Lemma 4.1], as cited in Lemma 3.2).

2. For the set $\mathcal{S} = \{\{w,x,y\}, \{x,y,z\}\}$ and partition $\pi = \{w,x,y\} | \{x,y,z\}$, there is a topology assignment $\mathcal{P}$, namely $\mathcal{P} = \{T_1, T_2\}$ as shown in Figure 2, such that all cross triples, e.g. $\{w,y,z\}$, are resolved. In this case, this is due to the fact that there is only one supertree.




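[Figure 2: input trees $T_1$ (leaves w, y, x) and $T_2$ (leaves x, y, z), and their unique supertree (leaves w, y, x, z).]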



Figure 2. For $\mathcal{S} = \{\{w,x,y\}, \{x,y,z\}\}$, the depicted topology assignment $\mathcal{P} := \{T_1, T_2\}$ leads to a unique supertree. Therefore, all cross triples with respect to the only possible partition of $\mathcal{S}$, namely $\pi = \{w,x,y\} | \{x,y,z\}$, are resolved.
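Both examples can be checked by brute force. Definition 2.2 asks whether some topology assignment resolves a cross triple; the Python sketch below checks one given assignment by enumerating every rooted binary tree on the union of the leaf sets, keeping those that display all input trees, and testing whether they all agree on the triple. It is our own sketch, not a method from [1]: trees are nested 2-tuples, all function names are hypothetical, and we assume, as we read Figure 2, that $T_1$ = ((w,y),x) and $T_2$ = ((x,y),z).

```python
from itertools import combinations


def rooted_trees(leaves):
    """All rooted binary trees on `leaves` as nested 2-tuples."""
    leaves = list(leaves)
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in _insertions(tree, leaf)]
    return trees


def _insertions(tree, leaf):
    """Attach `leaf` above the root and above every edge of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in _insertions(left, leaf):
            yield (t, right)
        for t in _insertions(right, leaf):
            yield (left, t)


def leaf_paths(tree, path=()):
    """Map each leaf to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths


def cherry(tree, x, y, z):
    """The pair of {x, y, z} whose lowest common ancestor is deepest."""
    paths = leaf_paths(tree)

    def lca_depth(a, b):
        depth = 0
        for p, q in zip(paths[a], paths[b]):
            if p != q:
                break
            depth += 1
        return depth

    return frozenset(max([(x, y), (x, z), (y, z)], key=lambda ab: lca_depth(*ab)))


def displays(big, small):
    """Rooted binary trees are determined by their rooted triples, so `big`
    displays `small` iff both resolve every triple of small's leaves alike."""
    return all(cherry(big, *t) == cherry(small, *t)
               for t in combinations(leaf_paths(small), 3))


def supertrees(assignment):
    """All rooted binary trees on the union of leaves displaying every input tree."""
    all_taxa = sorted(set().union(*(leaf_paths(t) for t in assignment)))
    return [T for T in rooted_trees(all_taxa)
            if all(displays(T, t) for t in assignment)]


def is_resolved(assignment, triple):
    """Definition 2.2 for one fixed assignment: all supertrees agree on `triple`."""
    trees = supertrees(assignment)
    return bool(trees) and len({cherry(T, *triple) for T in trees}) == 1


# Example 2, reading Figure 2 as T1 = ((w,y),x) and T2 = ((x,y),z)
P = [(("w", "y"), "x"), (("x", "y"), "z")]
print(len(supertrees(P)))                # 1
print(is_resolved(P, ("w", "y", "z")))   # True

# Example 1: trees on {v,w,x} and {x,y,z} sharing only x
Q = [(("v", "w"), "x"), (("x", "y"), "z")]
print(is_resolved(Q, ("v", "w", "y")))   # False
```

The `displays` test relies on the standard fact that a rooted binary phylogenetic tree is determined by its set of rooted triples.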

It is known that in the rooted case, either all possible supertrees of a topology assignment on a set $\mathcal{S}$ of taxon sets agree on how a certain triple $\{x,y,z\} \subseteq \mathcal{L}(\mathcal{S})$ should be resolved, or all three possible trees shown in Figure 1 are displayed in the set of supertrees [8, Prop. 9.1]. However, if the
latter scenario happens for all possible cross triples, there is no information 
in the supertrees that is not already inherent in the topology assignment 
itself. We will now formalize this idea. 

Definition 2.3 (Informative topology assignment). Let $\mathcal{S}$ be a set of taxon sets and $\pi$ a partition of $\mathcal{S}$ such that there exists a cross triple of $\mathcal{S}$ with respect to $\pi$. Then, a topology assignment $\mathcal{P}$ on $\mathcal{S}$ is called informative with respect to $\pi$ if some cross triple of $\mathcal{S}$ with respect to $\pi$ is resolved by $\mathcal{P}$.

Example (continued). In the above example, for the set $\mathcal{S} = \{\{v,w,x\}, \{x,y,z\}\}$ there is no informative topology assignment. However, the set $\mathcal{S} = \{\{w,x,y\}, \{x,y,z\}\}$ has an informative topology assignment with respect to partition $\pi = \{w,x,y\} | \{x,y,z\}$, namely the one shown in Figure 2.
As stated above, the idea now is to decide without considering specific input trees whether a set of taxon sets has the potential to resolve cross triples. Ane et al. showed in [1] that this can be done with their grove concept, because the potential to resolve cross triples mainly depends on certain overlap properties of the underlying taxon sets. We now define groves as in [1].

Definition 2.4 (Grove). A grove is a set $\mathcal{S}$ of taxon sets such that for each possible partition $\pi$ of $\mathcal{S}$ it holds either that

• there exists no cross triple of $\mathcal{S}$ with respect to $\pi$, or

• there exists an informative topology assignment on $\mathcal{S}$ w.r.t. $\pi$.
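For very small inputs, Definition 2.4 can be tested directly by exhaustive search. The Python sketch below is our own brute force, not an algorithm from [1]: it enumerates every partition of the taxon sets and every topology assignment (one rooted binary tree per taxon set, discarding incompatible combinations), records which triples each assignment resolves, and reports a failure as soon as some partition has cross triples of which none is resolved. Trees are nested 2-tuples and all names are hypothetical; the running time is exponential.

```python
from itertools import combinations, product

def rooted_trees(leaves):
    """All rooted binary trees on `leaves`, as nested 2-tuples."""
    leaves = list(leaves)
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in _insertions(tree, leaf)]
    return trees

def _insertions(tree, leaf):
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in _insertions(left, leaf):
            yield (t, right)
        for t in _insertions(right, leaf):
            yield (left, t)

def leaf_paths(tree, path=()):
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths

def cherry(tree, x, y, z):
    """Pair of {x, y, z} with the deepest lowest common ancestor in `tree`."""
    paths = leaf_paths(tree)
    def lca_depth(a, b):
        depth = 0
        for p, q in zip(paths[a], paths[b]):
            if p != q:
                break
            depth += 1
        return depth
    return frozenset(max([(x, y), (x, z), (y, z)], key=lambda ab: lca_depth(*ab)))

def displays(big, small):
    # rooted binary trees are determined by their rooted triples
    return all(cherry(big, *t) == cherry(small, *t)
               for t in combinations(leaf_paths(small), 3))

def set_partitions(items):
    """All partitions of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        yield [[first]] + p
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

def cross_triples(partition):
    block_taxa = [set().union(*block) for block in partition]
    all_taxa = set().union(*block_taxa)
    return {frozenset(t) for t in combinations(sorted(all_taxa), 3)
            if not any(set(t) <= bt for bt in block_taxa)}

def resolved_triples(assignment, all_taxa):
    """Triples on which all supertrees agree; empty if the trees are incompatible."""
    trees = [T for T in rooted_trees(all_taxa)
             if all(displays(T, t) for t in assignment)]
    if not trees:
        return set()
    return {frozenset(t) for t in combinations(all_taxa, 3)
            if len({cherry(T, *t) for T in trees}) == 1}

def is_grove(S):
    """Definition 2.4 by exhaustive search; feasible only for tiny inputs."""
    S = [frozenset(X) for X in S]
    all_taxa = sorted(set().union(*S))
    per_set = [rooted_trees(sorted(X)) for X in S]
    resolved = [resolved_triples(a, all_taxa) for a in product(*per_set)]
    for p in set_partitions(S):
        ct = cross_triples(p)
        if ct and not any(ct & r for r in resolved):
            return False  # cross triples exist, but no assignment resolves one
    return True

print(is_grove([{"v", "w", "x"}, {"x", "y", "z"}]))  # False (Example 1)
print(is_grove([{"w", "x", "y"}, {"x", "y", "z"}]))  # True (Example 2)
```

Precomputing the resolved triples once per assignment keeps the partition loop cheap, since the same assignments are reused for every partition.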

Example (continued). In the previous example, the set $\mathcal{S} = \{\{v,w,x\}, \{x,y,z\}\}$ is not a grove, because there is a partition of $\mathcal{S}$, namely $\pi = \{v,w,x\} | \{x,y,z\}$, which has some cross triples but no resolved one for any topology assignment. On the other hand, the set $\mathcal{S} = \{\{w,x,y\}, \{x,y,z\}\}$ is a grove because there is only one possible partition of $\mathcal{S}$, namely $\pi = \{w,x,y\} | \{x,y,z\}$, with respect to which there is an informative topology assignment as shown in Figure 2.

So groves have the property that if there is a cross triple with respect 
to a partition, at least one of these cross triples has to be resolved by some 
topology assignment. This implies that the supertree(s) on all taxon sets in 
a grove S reveal some new information which is not present in any of the 
input phylogenies. Note, however, that by the first property in Definition 2.4, the authors of [1] explicitly allow for the situation where there is no cross triple for any partition of $\mathcal{S}$. This occurs, for instance, if $\mathcal{S}$ contains one taxon set in which all taxa of $\mathcal{S}$ are present. In this case, the supertree is always identical to the tree induced by this taxon set and therefore never reveals new information. The authors state that they include this case for biological reasons, which are not further explained. However, this means that while groves may have the potential to reveal new information, they are not guaranteed to do so. In Section 3.4, we define informative and strictly informative groves in order to overcome this problem.

Next, in order to understand Conjecture 1.1, we need to introduce a formal concept of maximality.

Definition 2.5 (Maximal grove). A grove $\mathcal{S}$ in a database $\mathcal{D}$ of taxon sets is called maximal with respect to $\mathcal{D}$, or maximal for short, if there is no taxon set $X \in \mathcal{D} \setminus \mathcal{S}$ such that $\mathcal{S} \cup \{X\}$ is also a grove.
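Definition 2.5 translates into a short check once a grove test is available. In the Python sketch below the grove test is passed in as a function (for tiny inputs, an exhaustive search over partitions and topology assignments would do); `is_maximal` is a hypothetical name, and the demonstration deliberately uses a simple stand-in predicate that is NOT the grove property, only to show how the maximality check behaves.

```python
from itertools import combinations


def is_maximal(S, database, is_grove):
    """Definition 2.5: no taxon set of `database` outside S can be added to S
    such that the result is still a grove. `is_grove` is supplied by the caller."""
    S = {frozenset(X) for X in S}
    candidates = {frozenset(X) for X in database} - S
    return not any(is_grove(S | {X}) for X in candidates)


# Stand-in predicate for illustration only; it is NOT the grove property.
def toy_predicate(S):
    return all(len(A & B) >= 2 for A, B in combinations(S, 2))


D = [{1, 2, 3}, {2, 3, 4}, {5, 6}]
print(is_maximal([{1, 2, 3}, {2, 3, 4}], D, toy_predicate))  # True
print(is_maximal([{1, 2, 3}], D, toy_predicate))             # False
```

Only taxon sets not yet in $\mathcal{S}$ are tried; adding a set that is already present would trivially leave $\mathcal{S}$ unchanged.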

3. Results 

3.1. Characterizing groves 

Our main goal in this section is to simplify the concept of groves by intro- 
ducing tripartition groves. Moreover, we state some general properties of 
splits, partitions and groves. Our first two lemmas formalize the concept of 
cross triples and resolved cross triples for so-called splits. Recall that a split $\sigma = \mathcal{S}_1 | \mathcal{S}_2$ of a set of taxon sets $\mathcal{S}$ is a bipartition of $\mathcal{S}$, i.e. a partition of $\mathcal{S}$ into two disjoint parts.

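Lemma 3.1. Let $\mathcal{S}$ be a set of taxon sets, let $\pi = \mathcal{S}_1 | \dots | \mathcal{S}_k$ be a partition of $\mathcal{S}$ and let $\sigma = \mathcal{S}_1 | \mathcal{S}_2$ be a split of $\mathcal{S}$.

1. Assume there is a cross triple of $\mathcal{S}$ with respect to $\pi$. Then, there are $i, j \in \{1, \dots, k\}$, $i \neq j$, such that there are two taxa $s_i, s_j$ fulfilling the following properties:

• $s_i \in \mathcal{L}(\mathcal{S}_i)$, $s_i \notin \mathcal{L}(\mathcal{S}_j)$, and

• $s_j \in \mathcal{L}(\mathcal{S}_j)$, $s_j \notin \mathcal{L}(\mathcal{S}_i)$.

2. Assume $|\mathcal{L}(\mathcal{S})| \geq 3$ and assume there is no cross triple of $\mathcal{S}$ with respect to $\sigma$. Then, either $\mathcal{L}(\mathcal{S}_1) \subseteq \mathcal{L}(\mathcal{S}_2)$ or $\mathcal{L}(\mathcal{S}_2) \subseteq \mathcal{L}(\mathcal{S}_1)$, i.e. either all taxa of $\mathcal{S}_1$ also lie in $\mathcal{S}_2$ or vice versa.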

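Proof.

1. Let $\{x, y, z\}$ be a cross triple of $\mathcal{S}$ with respect to $\pi$. By definition of cross triples, x, y, z are not all contained together in any $\mathcal{S}_m$, $m = 1, \dots, k$. Without loss of generality, choose $\mathcal{S}_x \in \{\mathcal{S}_1, \dots, \mathcal{S}_k\}$ such that $x \in \mathcal{L}(\mathcal{S}_x)$ and $y \notin \mathcal{L}(\mathcal{S}_x)$. However, as $y \in \mathcal{L}(\mathcal{S})$, there is an $\mathcal{S}_y \in \{\mathcal{S}_1, \dots, \mathcal{S}_k\}$ such that $y \in \mathcal{L}(\mathcal{S}_y)$. If possible, choose $\mathcal{S}_y$ such that $x \notin \mathcal{L}(\mathcal{S}_y)$. Then, let $s_i := x$, $s_j := y$ and $\mathcal{S}_i := \mathcal{S}_x$, $\mathcal{S}_j := \mathcal{S}_y$. Else, if all sets which contain y also contain x, choose a set $\mathcal{S}_z$ such that $z \in \mathcal{L}(\mathcal{S}_z)$. Now if y was contained in $\mathcal{S}_z$, $\mathcal{S}_z$ would also contain x, which contradicts the cross triple assumption. For the same reason, $z \notin \mathcal{L}(\mathcal{S}_y)$. So in this case, $y \in \mathcal{L}(\mathcal{S}_y)$, $y \notin \mathcal{L}(\mathcal{S}_z)$, $z \in \mathcal{L}(\mathcal{S}_z)$, $z \notin \mathcal{L}(\mathcal{S}_y)$. Then, let $s_i := y$, $s_j := z$ and $\mathcal{S}_i := \mathcal{S}_y$, $\mathcal{S}_j := \mathcal{S}_z$. This completes the proof.

2. Assume there is no cross triple of $\mathcal{S}$ with respect to $\sigma$. If not all taxa of $\mathcal{S}_1$ are contained in $\mathcal{S}_2$, there is a taxon $s_1$ such that $s_1 \in \mathcal{L}(\mathcal{S}_1)$ and $s_1 \notin \mathcal{L}(\mathcal{S}_2)$. If additionally not all taxa of $\mathcal{S}_2$ are contained in $\mathcal{S}_1$, there is a taxon $s_2$ such that $s_2 \in \mathcal{L}(\mathcal{S}_2)$ and $s_2 \notin \mathcal{L}(\mathcal{S}_1)$. Now as $|\mathcal{L}(\mathcal{S})| \geq 3$, there is a $z \in \mathcal{L}(\mathcal{S})$ such that $z \neq s_1, s_2$. Then, by definition the set $\{s_1, s_2, z\}$ is a cross triple of $\mathcal{S}$ with respect to $\sigma$. This contradicts the assumption and thus completes the proof. □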
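Part 1 of Lemma 3.1 is constructive. The Python sketch below does not replay each case of the proof; instead, since the proof always picks $s_i$ and $s_j$ from the cross triple itself, it simply searches over ordered pairs of the triple's taxa and over pairs of blocks. The partition representation (a list of blocks, each a list of taxon sets) and the name `separating_taxa` are our own assumptions.

```python
from itertools import permutations


def separating_taxa(partition, triple):
    """Lemma 3.1(1): for a cross triple, return (i, j, s_i, s_j) with i != j,
    s_i in L(S_i) but not in L(S_j), and s_j in L(S_j) but not in L(S_i).

    The proof picks s_i and s_j from the triple itself, so searching over
    ordered pairs of the triple's taxa is enough."""
    block_taxa = [set().union(*block) for block in partition]
    for s_i, s_j in permutations(triple, 2):
        for i, L_i in enumerate(block_taxa):
            for j, L_j in enumerate(block_taxa):
                if (i != j and s_i in L_i and s_i not in L_j
                        and s_j in L_j and s_j not in L_i):
                    return i, j, s_i, s_j
    return None  # by Lemma 3.1(1), unreachable when `triple` is a cross triple


pi = [[{"v", "w", "x"}], [{"x", "y", "z"}]]
print(separating_taxa(pi, ("v", "w", "y")))  # (0, 1, 'v', 'y')
```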

Next we recall a result by Ane et al. in order to prove the following 
lemma. 

Lemma 3.2 (Lemma 4.1 of [1]). Let $X$ and $X'$ be two taxon sets that share at most one taxon. Then, any trees on $X$ and $X'$, respectively, are compatible, and any cross triple of $\mathcal{S} := \{X, X'\}$ with respect to $\pi := X | X'$ remains unresolved.

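Lemma 3.3. Let $\mathcal{S}$ be a set of taxon sets and let $\sigma = \mathcal{S}_1 | \mathcal{S}_2$ be a split of $\mathcal{S}$.

1. If there is a resolved cross triple of $\mathcal{S}$ with respect to $\sigma$, then
(a) there is a taxon $s_1 \in \mathcal{L}(\mathcal{S}_1)$ such that $s_1 \notin \mathcal{L}(\mathcal{S}_2)$,
(b) there is a taxon $s_2 \in \mathcal{L}(\mathcal{S}_2)$ such that $s_2 \notin \mathcal{L}(\mathcal{S}_1)$, and
(c) there are two taxa $x, y \in \mathcal{L}(\mathcal{S})$, $x \neq y$, such that $x, y$ are both in $\mathcal{L}(\mathcal{S}_1) \cap \mathcal{L}(\mathcal{S}_2)$.

2. If properties 1.(a)-(c) hold and additionally there are taxon sets $X_1 \in \mathcal{S}_1$ and $X_2 \in \mathcal{S}_2$ such that $s_1, x, y \in X_1$ and $s_2, x, y \in X_2$, then there is a resolved cross triple of $\mathcal{S}$ with respect to $\sigma$.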

Proof. 1. By Lemma 3.1, every cross triple implies properties (a) and (b). Now we show that a resolved cross triple additionally implies (c). Assume there are no taxa x, y with the properties specified in (c). This means that $\mathcal{S}_1$ and $\mathcal{S}_2$ share at most one taxon. Now we can choose any trees $T_1$ for $\mathcal{S}_1$ and $T_2$ for $\mathcal{S}_2$. As they overlap in at most one taxon, they are compatible by Lemma 3.2 and can be combined into a common supertree by regrafting $T_1$ onto any edge of $T_2$. This way, no cross triple can be resolved. So if a cross triple is resolved, (c) must hold.

2. Now we show that (a), (b) and (c), together with the properties $s_1, x, y \in X_1 \in \mathcal{S}_1$ and $s_2, x, y \in X_2 \in \mathcal{S}_2$, imply that there is a resolved cross triple of $\mathcal{S}$ with respect to $\sigma$. We prove this by constructing an informative topology assignment. We choose $T_1$ for $X_1$ and $T_2$ for $X_2$ such that they are so-called caterpillar trees as depicted in Figure 3.



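[Figure 3: caterpillar trees $T_1$ and $T_2$; the right panel shows the induced subtree (((s_2, x), y), s_1).]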




Figure 3. Whenever the so-called caterpillar trees $T_1$ and $T_2$ are compatible and have the depicted taxon labellings, all supertrees combining these two trees will display the subtree (((s_2, x), y), s_1) as shown on the right.

In particular, we choose $T_1$ such that the taxa x and y specified by property (c) are together on the only 2-clade (so-called 'cherry') and taxon $s_1$ (specified by properties (a) and (c)) is the taxon sharing the 3-clade with them. Moreover, we choose $T_2$ such that $s_2$ and x (specified by properties (b) and (c)) are together on the 2-clade and taxon y is the taxon sharing the 3-clade with them. The whole setting of $T_1$ and $T_2$ is shown in Figure 3. All other properties of $T_1$ and $T_2$, as well as all other trees for the other sets in $\mathcal{S}_1$ and $\mathcal{S}_2$ (if there are any), are chosen arbitrarily but in a way that leaves them compatible (i.e. if two sets share some taxa, we do not add conflicting information to the corresponding trees). By properties (a) and (b), the triple $\{s_1, s_2, x\}$, for instance, is a cross triple with respect to $\sigma$. Now every supertree comprising $T_1$ and $T_2$ (along with other trees if applicable) takes into account that, by $T_1$, x and y should be closer together than to any other taxon which is also included in $T_1$. Also, by $T_2$, x and $s_2$ are closer to one another than either of them is to y, but still closer to y than to any other taxon of $T_2$. This implies the following subtree for each possible supertree: (((s_2, x), y), s_1). Therefore, the cross triple $\{s_1, s_2, x\}$ is resolved, as the only possible way to display this triple is ((s_2, x), s_1), as shown in Figure 3. Thus, we have constructed an informative topology assignment of $\mathcal{S}$ with respect to $\sigma$. This completes the proof.

□ 
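The construction in part 2 can be checked by brute force on its smallest instance. In the Python sketch below we assume (our own choice) that $X_1 = \{s_1, x, y\}$ and $X_2 = \{s_2, x, y\}$ are the only taxon sets, take the caterpillars $T_1$ = ((x,y),s_1) and $T_2$ = ((s_2,x),y), enumerate all rooted binary trees on the four taxa, and keep those displaying both. Trees are nested 2-tuples and every function name is hypothetical.

```python
from itertools import combinations


def rooted_trees(leaves):
    """All rooted binary trees on `leaves` as nested 2-tuples."""
    leaves = list(leaves)
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in _insertions(tree, leaf)]
    return trees


def _insertions(tree, leaf):
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in _insertions(left, leaf):
            yield (t, right)
        for t in _insertions(right, leaf):
            yield (left, t)


def leaf_paths(tree, path=()):
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths


def cherry(tree, x, y, z):
    """The pair of {x, y, z} whose lowest common ancestor is deepest."""
    paths = leaf_paths(tree)

    def lca_depth(a, b):
        depth = 0
        for p, q in zip(paths[a], paths[b]):
            if p != q:
                break
            depth += 1
        return depth

    return frozenset(max([(x, y), (x, z), (y, z)], key=lambda ab: lca_depth(*ab)))


def displays(big, small):
    # rooted binary trees are determined by their rooted triples
    return all(cherry(big, *t) == cherry(small, *t)
               for t in combinations(leaf_paths(small), 3))


def supertrees(assignment):
    all_taxa = sorted(set().union(*(leaf_paths(t) for t in assignment)))
    return [T for T in rooted_trees(all_taxa)
            if all(displays(T, t) for t in assignment)]


T1 = (("x", "y"), "s1")   # cherry {x, y}; s1 joins them next
T2 = (("s2", "x"), "y")   # cherry {s2, x}; y joins them next
trees = supertrees([T1, T2])
print(len(trees))                                  # 1
print(sorted(cherry(trees[0], "s1", "s2", "x")))   # ['s2', 'x']
```

Here the supertree is even unique, namely (((s_2,x),y),s_1), so the cross triple $\{s_1, s_2, x\}$ is displayed as ((s_2,x),s_1) in every supertree, as the proof asserts.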

Next we introduce the definition of split groves and tripartition groves in 
order to simplify the definition of groves. 

Definition 3.4 (Split grove). A split grove is a set $\mathcal{S}$ of taxon sets such that for each possible split $\sigma$ of $\mathcal{S}$ it holds either that

• there exists no cross triple of $\mathcal{S}$ with respect to $\sigma$, or

• there exists an informative topology assignment on $\mathcal{S}$ w.r.t. $\sigma$.

Remark 3.5. Obviously, every grove is also a split grove by definition. However, the converse is not always true. This can be seen by looking at the set $\mathcal{S} := \{\{x,y\}, \{y,z\}, \{x,z\}\}$. None of the possible splits $\sigma_1 := \{x,y\} | \{y,z\},\{x,z\}$, $\sigma_2 := \{x,y\},\{y,z\} | \{x,z\}$ or $\sigma_3 := \{y,z\} | \{x,y\},\{x,z\}$ has any cross triples, as in each case all three taxa appear together on one side of the split. So by Definition 3.4, $\mathcal{S}$ is a split grove. However, the partition $\pi := \{x,y\} | \{y,z\} | \{x,z\}$ has a cross triple, namely $\{x,y,z\}$. This cross triple is not resolved by any tree combining the three 2-taxon sets ('cherries'), as 2-taxon trees do not provide any information on the tree topology. Thus, $\mathcal{S}$ is not a grove, as there is a partition which has a cross triple, but no resolved one.
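The case analysis of Remark 3.5 is small enough to enumerate. The Python sketch below, under our own representation (taxon sets as frozensets, a partition as a list of blocks), lists all partitions of $\mathcal{S} = \{\{x,y\},\{y,z\},\{x,z\}\}$ with two or three blocks and computes their cross triples; the helper names are hypothetical.

```python
from itertools import combinations


def set_partitions(items):
    """All partitions of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        yield [[first]] + p
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]


def cross_triples(partition):
    """Triples of taxa not contained in the taxa of any single block."""
    block_taxa = [set().union(*block) for block in partition]
    all_taxa = set().union(*block_taxa)
    return {frozenset(t) for t in combinations(sorted(all_taxa), 3)
            if not any(set(t) <= bt for bt in block_taxa)}


S = [frozenset("xy"), frozenset("yz"), frozenset("xz")]
splits = [p for p in set_partitions(S) if len(p) == 2]
tripartitions = [p for p in set_partitions(S) if len(p) == 3]
print([cross_triples(p) for p in splits])         # three empty sets
print([cross_triples(p) for p in tripartitions])  # one set containing {x, y, z}
```

All three splits come out without cross triples, whereas the single tripartition has the cross triple {x, y, z}, matching the remark.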

As the above remark shows, the concept of split groves is not strong 
enough to cover the grove concept and to simplify it. Therefore, we next 
introduce tripartition groves and then show that they indeed are equivalent 
to groves. Recall that a tripartition is a partition of a set into three disjoint 
subsets. 

Definition 3.6 (Tripartition grove). A tripartition grove is a set $\mathcal{S}$ of taxon sets such that for each possible split or tripartition $\tau$ of $\mathcal{S}$ it holds either that

• there exists no cross triple of $\mathcal{S}$ with respect to $\tau$, or

• there exists an informative topology assignment on $\mathcal{S}$ w.r.t. $\tau$.

Again, all groves are tripartition groves by definition. Next we show 
that the converse is also true, which provides a significant simplification of 
the grove concept. 

Theorem 3.7. Every tripartition grove is a grove and every grove is a tripartition 
grove. 
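Proof. As mentioned before, the second part directly follows by definition. We now prove the first part. Let $\mathcal{S}$ be a tripartition grove and let $\pi = \mathcal{S}_1 | \dots | \mathcal{S}_k$ be a partition of $\mathcal{S}$. If there is no cross triple of $\mathcal{S}$ with respect to $\pi$, there is nothing to show. So now assume that there is a cross triple $\{x, y, z\}$ of $\mathcal{S}$ with respect to $\pi$. We have to show that there is also a resolved cross triple. Let

$$\mathcal{S}_{xy} := \bigcup_{\substack{i \in \{1,\dots,k\}\\ x \in \mathcal{L}(\mathcal{S}_i),\ z \notin \mathcal{L}(\mathcal{S}_i)}} \mathcal{S}_i, \quad \mathcal{S}_{xz} := \bigcup_{\substack{i \in \{1,\dots,k\}\\ z \in \mathcal{L}(\mathcal{S}_i),\ y \notin \mathcal{L}(\mathcal{S}_i)}} \mathcal{S}_i, \quad \mathcal{S}_{yz} := \bigcup_{\substack{i \in \{1,\dots,k\}\\ y \in \mathcal{L}(\mathcal{S}_i),\ x \notin \mathcal{L}(\mathcal{S}_i)}} \mathcal{S}_i, \quad \bar{\mathcal{S}} := \bigcup_{\substack{i \in \{1,\dots,k\}\\ x, y, z \notin \mathcal{L}(\mathcal{S}_i)}} \mathcal{S}_i.$$

Now let $\tau := \mathcal{S}_{xy} | \mathcal{S}_{xz} | \mathcal{S}_{yz} \cup \bar{\mathcal{S}}$ be a tripartition separating $\{x, y, z\}$ such that it is a cross triple with respect to $\tau$ by construction. This is possible as $\{x, y, z\}$ is a cross triple of $\mathcal{S}$ with respect to $\pi$, and thus the three taxa do not appear together in any $\mathcal{S}_i$. Note that if $\pi$ is a 2-partition, i.e. a split, it is possible that one of the sets $\mathcal{S}_{xy}$, $\mathcal{S}_{xz}$, $\mathcal{S}_{yz}$ as well as the set $\bar{\mathcal{S}}$ is empty, in which case $\tau$ is a split. However, as $\mathcal{S}$ is a tripartition grove, the existence of a cross triple with respect to $\tau$ implies the existence of a resolved cross triple $\{\tilde{x}, \tilde{y}, \tilde{z}\}$ of $\mathcal{S}$ with respect to $\tau$. This cross triple is also a cross triple with respect to $\pi$ (otherwise $\tilde{x}, \tilde{y}, \tilde{z}$ would appear together in one $\mathcal{S}_i$ and would therefore not be separated by $\tau$ either, since every part of $\tau$ is a union of parts of $\pi$). So, using the same topology assignment that resolves $\{\tilde{x}, \tilde{y}, \tilde{z}\}$ for $\tau$, $\{\tilde{x}, \tilde{y}, \tilde{z}\}$ is a resolved cross triple of $\mathcal{S}$ with respect to $\pi$. This completes the proof. □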
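The coarsening step of this proof is easy to mechanize. The Python sketch below, assuming a partition is a list of blocks and each block a list of taxon sets (our own representation, with the hypothetical name `coarsen_to_tripartition`), merges the blocks of $\pi$ into $\mathcal{S}_{xy}$, $\mathcal{S}_{xz}$ and $\mathcal{S}_{yz} \cup \bar{\mathcal{S}}$ following the definitions above, and drops empty parts so that the result may degenerate to a split.

```python
def coarsen_to_tripartition(partition, triple):
    """Merge the blocks of pi into tau = S_xy | S_xz | (S_yz together with S_bar),
    so that the cross triple {x, y, z} of pi remains a cross triple of tau."""
    x, y, z = triple
    S_xy, S_xz, S_yz, S_bar = [], [], [], []
    for block in partition:
        L = set().union(*block)  # taxa of this block
        if x in L and z not in L:
            S_xy.extend(block)
        elif z in L and y not in L:
            S_xz.extend(block)
        elif y in L and x not in L:
            S_yz.extend(block)
        else:
            # only blocks containing none of x, y, z end up here, because
            # no block contains all three taxa of a cross triple
            S_bar.extend(block)
    # drop empty parts: if pi is a split, tau may itself be a split
    return [part for part in (S_xy, S_xz, S_yz + S_bar) if part]


pi = [[{"x", "y"}], [{"y", "z"}], [{"x", "z"}], [{"u", "v"}]]
tau = coarsen_to_tripartition(pi, ("x", "y", "z"))
print(tau)
```

Every part of the result is a union of blocks of $\pi$, and none of its parts contains all of x, y, z.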

Theorem 3.7 shows that the concepts of groves and tripartition groves are identical. This characterization of groves thus considerably simplifies the search for groves, as it reduces the analysis from all possible partitions to splits and tripartitions.
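The size of this reduction can be quantified with standard counting: a set of n taxon sets has B(n) partitions (the Bell number), of which S(n,2) are splits and S(n,3) are tripartitions (Stirling numbers of the second kind), and the single one-block partition never has cross triples. The short Python sketch below compares the two counts; it only counts partitions and says nothing about the cost of checking each one.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind: partitions of n items into k non-empty blocks."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n):
    """Total number of partitions of n items."""
    return sum(stirling2(n, k) for k in range(n + 1))


for n in range(3, 11):
    nontrivial = bell(n) - 1  # every partition except the single-block one
    reduced = stirling2(n, 2) + stirling2(n, 3)  # splits plus tripartitions
    print(f"{n:2d} taxon sets: {nontrivial:6d} partitions vs. {reduced:5d} splits/tripartitions")
```

For n = 10 taxon sets, for instance, the 115974 non-trivial partitions shrink to 511 + 9330 = 9841 splits and tripartitions.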

In the following section, we finally investigate Conjecture 1.1.



3.2. Examining unions of intersecting groves 

Our main goal in this section is to show that Conjecture 1.1 does not hold in general. To this end, we first show that the three properties of the conjecture are indeed equivalent, which enables us to provide a counterexample to property (2) and thereby disprove property (1).

Lemma 3.8. The three properties stated by Conjecture 1.1 are equivalent.

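Proof.

• We start by showing that (1) implies (2). Let $\mathcal{S}_1, \mathcal{S}_2 \subseteq \mathcal{D}$ be groves in a database of taxon sets $\mathcal{D}$ such that $\mathcal{S}_1 \cap \mathcal{S}_2 \neq \emptyset$. Assume $\mathcal{S}_1 \cup \mathcal{S}_2$ is not a grove. Let $\mathcal{S}_1^{\max}$, $\mathcal{S}_2^{\max}$ be maximal supergroves of $\mathcal{S}_1$, $\mathcal{S}_2$, respectively, i.e. $\mathcal{S}_1^{\max}$, $\mathcal{S}_2^{\max}$ are groves that contain $\mathcal{S}_1$ or $\mathcal{S}_2$, respectively, and by Definition 2.5 they are such that no other taxon set of $\mathcal{D}$ can be added to them without destroying the grove property. Note that it is possible that $\mathcal{S}_i = \mathcal{S}_i^{\max}$ for $i = 1, 2$. As by assumption $\mathcal{S}_1 \cup \mathcal{S}_2$ is not a grove, we have $\mathcal{S}_i^{\max} \neq \mathcal{S}_1 \cup \mathcal{S}_2$ for $i = 1, 2$. Now as $\mathcal{S}_1 \cap \mathcal{S}_2 \neq \emptyset$, we have $\mathcal{S}_1^{\max} \cap \mathcal{S}_2^{\max} \neq \emptyset$. But because of property (1) of Conjecture 1.1, the set of maximal groves of $\mathcal{D}$ is a partition of $\mathcal{D}$, which in particular means that they cannot intersect. So this is a contradiction, and therefore the assumption is wrong. Thus, $\mathcal{S}_1 \cup \mathcal{S}_2$ is a grove.

• Next we show that (2) implies (3). Let $\mathcal{S}_1^{\max}$, $\mathcal{S}_2^{\max}$ be two maximal groves such that $\mathcal{S}_1^{\max} \neq \mathcal{S}_2^{\max}$. Assume $\mathcal{S}_1^{\max} \cap \mathcal{S}_2^{\max} \neq \emptyset$. Then by property (2), $\mathcal{S}_1^{\max} \cup \mathcal{S}_2^{\max}$ is a grove. But as $\mathcal{S}_1^{\max} \neq \mathcal{S}_2^{\max}$, $\mathcal{S}_1^{\max} \cup \mathcal{S}_2^{\max}$ contains at least one taxon set which is not present in $\mathcal{S}_1^{\max}$ or at least one which is not present in $\mathcal{S}_2^{\max}$. Therefore, $|\mathcal{S}_1^{\max} \cup \mathcal{S}_2^{\max}| > |\mathcal{S}_1^{\max}|$ or $|\mathcal{S}_1^{\max} \cup \mathcal{S}_2^{\max}| > |\mathcal{S}_2^{\max}|$. Either way, this contradicts the maximality of $\mathcal{S}_1^{\max}$ or $\mathcal{S}_2^{\max}$. Thus, the assumption is wrong and $\mathcal{S}_1^{\max} \cap \mathcal{S}_2^{\max} = \emptyset$.

• Last, we show that (3) implies (1). As (3) already states that maximal groves do not intersect, it only remains to show that each taxon set $X \in \mathcal{D}$ belongs to a maximal grove. For every $X \in \mathcal{D}$, the set $\{X\}$ is a grove by definition (as a single taxon set admits no partition with cross triples). Now each such grove can either be combined with other taxon sets in $\mathcal{D}$ in order to form a bigger grove (which can then be maximized by adding more taxon sets if possible), or no such combination is possible. In the latter case, $\{X\}$ itself is maximal by definition. So in both cases, X belongs to a maximal grove. □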

We are now in a position to show that Conjecture 1.1 does not hold.

Proposition 3.9. Conjecture 1.1 is not generally true. In particular, the union of intersecting groves is not necessarily a grove.

Proof. We provide an explicit counterexample to property (2) of Conjecture 1.1. Let $\mathcal{S}_1 := \{\{1,2,3\}, \{1\}\}$ and $\mathcal{S}_2 := \{\{1,4,5\}, \{1\}\}$. Then $\mathcal{S}_1$ is a grove, as there is only one possible partition, namely $\pi := \{1,2,3\} | \{1\}$, and for this partition there are no cross triples. By the same argument, $\mathcal{S}_2$ is also a grove. Now $\mathcal{S}_1 \cap \mathcal{S}_2 = \{\{1\}\} \neq \emptyset$. If Conjecture 1.1 was true, $\mathcal{S}_1 \cup \mathcal{S}_2 = \{\{1,2,3\}, \{1\}, \{1,4,5\}\}$ would be a grove. But by Lemma 3.2 this cannot be the case, as the sets $\{1,2,3\}$ and $\{1,4,5\}$ only share one taxon, and thus all cross triples between the two sets remain unresolved. Moreover, the set $\{1\}$ does not contribute any new information to either of the trees. Therefore, $\mathcal{S}_1 \cup \mathcal{S}_2$ is not a grove. □
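Because the counterexample involves only five taxa, it can also be confirmed by exhaustive search. The Python sketch below is our own brute force for Definition 2.4, not a method from [1]: it enumerates all partitions of a set of taxon sets and all topology assignments (nested-tuple trees, one per taxon set, incompatible combinations discarded) and checks whether every partition with cross triples has some resolved one. All function names are hypothetical, and the approach is exponential.

```python
from itertools import combinations, product

def rooted_trees(leaves):
    """All rooted binary trees on `leaves`, as nested 2-tuples."""
    leaves = list(leaves)
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in _insertions(tree, leaf)]
    return trees

def _insertions(tree, leaf):
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in _insertions(left, leaf):
            yield (t, right)
        for t in _insertions(right, leaf):
            yield (left, t)

def leaf_paths(tree, path=()):
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths

def cherry(tree, x, y, z):
    """Pair of {x, y, z} with the deepest lowest common ancestor in `tree`."""
    paths = leaf_paths(tree)
    def lca_depth(a, b):
        depth = 0
        for p, q in zip(paths[a], paths[b]):
            if p != q:
                break
            depth += 1
        return depth
    return frozenset(max([(x, y), (x, z), (y, z)], key=lambda ab: lca_depth(*ab)))

def displays(big, small):
    # rooted binary trees are determined by their rooted triples
    return all(cherry(big, *t) == cherry(small, *t)
               for t in combinations(leaf_paths(small), 3))

def set_partitions(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        yield [[first]] + p
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

def cross_triples(partition):
    block_taxa = [set().union(*block) for block in partition]
    all_taxa = set().union(*block_taxa)
    return {frozenset(t) for t in combinations(sorted(all_taxa), 3)
            if not any(set(t) <= bt for bt in block_taxa)}

def resolved_triples(assignment, all_taxa):
    """Triples on which all supertrees agree; empty if the trees are incompatible."""
    trees = [T for T in rooted_trees(all_taxa)
             if all(displays(T, t) for t in assignment)]
    if not trees:
        return set()
    return {frozenset(t) for t in combinations(all_taxa, 3)
            if len({cherry(T, *t) for T in trees}) == 1}

def is_grove(S):
    """Definition 2.4 by exhaustive search; feasible only for tiny inputs."""
    S = [frozenset(X) for X in S]
    all_taxa = sorted(set().union(*S))
    per_set = [rooted_trees(sorted(X)) for X in S]
    resolved = [resolved_triples(a, all_taxa) for a in product(*per_set)]
    for p in set_partitions(S):
        ct = cross_triples(p)
        if ct and not any(ct & r for r in resolved):
            return False  # cross triples exist, but no assignment resolves one
    return True

S1 = [{1, 2, 3}, {1}]
S2 = [{1, 4, 5}, {1}]
union = [{1, 2, 3}, {1}, {1, 4, 5}]
print(is_grove(S1), is_grove(S2), is_grove(union))  # True True False
```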

In our proof, we explicitly use Lemma 4.1 of [1] in order to show that the example provided is a counterexample to property (2) of Conjecture 1.1, but of course one can also examine all possible topology assignments for the sets $\{1,2,3\}$ and $\{1,4,5\}$ (the singleton $\{1\}$ does not contribute any information). The only possible cross triples of $\mathcal{S}_1 \cup \mathcal{S}_2$ (if there are any, which is only the case for partitions which split the sets $\{1,2,3\}$ and $\{1,4,5\}$ apart from each other) are $\{1,2,4\}$, $\{1,2,5\}$, $\{1,3,4\}$, $\{1,3,5\}$, $\{2,3,4\}$, $\{2,3,5\}$, $\{2,4,5\}$ and $\{3,4,5\}$. Now the intuitive reason why none of these cross triples is resolved when combining the topologies assigned to the sets $\{1,2,3\}$ and $\{1,4,5\}$ is that their only overlap is taxon 1, which does not suffice to fix anything in the possible supertrees.
3.3. Some corrections concerning the paper by Ane et al. [1]

It should be noted that $\mathcal{S}_1 := \{\{1,2,3\}, \{1\}\}$ and $\mathcal{S}_2 := \{\{1,4,5\}, \{1\}\}$ as presented in the proof of Proposition 3.9 are indeed both groves, despite the fact that each of them consists of only two taxon sets which overlap with one another in just one taxon. This is in contrast to a statement in the context preceding Proposition 5.1 of [1], where the authors claim that the smallest example of a grove with one-taxon overlaps consists of four taxon sets. In fact, the authors make the same mistake in both Propositions 5.1 and 5.2. Therefore, we will now formally state these propositions and correct them thereafter.

Proposition 5.1 of [1]. Let $\mathcal{S}$ be a set of three taxon sets such that no two elements of $\mathcal{S}$ share more than one taxon.

1. Then, no cross triple of $\mathcal{S}$ with respect to any partition is resolved, and

2. therefore, $\mathcal{S}$ is not a grove.

The second statement in Proposition 5.1 of [1] is wrong. In fact, the authors correctly prove the first statement but then conclude the second statement from the first without taking into account that their definition of groves does not require the existence of any cross triples. The first part of the proposition, however, is correct and interesting; it is another formal statement proving that $\mathcal{S}_1 \cup \mathcal{S}_2$ as in the proof of Proposition 3.9 cannot be a grove. This is due to the fact that this set can be partitioned such that it has cross triples, but as it consists of three taxon sets where no element shares more than one taxon with another element, no such cross triple can be resolved.

Nonetheless, there are groves consisting of three taxon sets where no two elements share more than one taxon. For instance, $\mathcal{S} := \{\{1,2,3\}, \{1,2\}, \{2,3\}\}$ is a grove contradicting statement (2) of Proposition 5.1 of [1]. This is due to the fact that one set in $\mathcal{S}$ contains all taxa, so no partition contains any cross triples.

As the authors use Proposition 5.1 of [1] to prove the following proposition, it is not surprising that the same mistake occurs there, too.

Proposition 5.2 of [1]. Let $\mathcal{S}$ be a set of four taxon sets such that no two elements of $\mathcal{S}$ share more than one taxon. Then, $\mathcal{S}$ is a grove if and only if

1. each taxon set shares a taxon with each other taxon set, and

2. each of the six overlaps involves a different taxon.

The statement of Proposition 5.2 of [1] can be proven wrong by considering the set $\mathcal{S} := \{\{1,2,3\}, \{1,2\}, \{1,3\}, \{1\}\}$. Again, as one taxon set in $\mathcal{S}$ contains all taxa, no partition produces any cross triples, which makes $\mathcal{S}$ a grove. But all six overlaps involve the same taxon, namely taxon 1. However, the other direction of Proposition 5.2 of [1] is true, and both directions are true for groves which do indeed have at least one cross triple with respect to some partition.
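The key observation behind both corrections, namely that a taxon set containing all taxa rules out cross triples for every partition, can be checked mechanically. The Python sketch below, with our own list-of-blocks representation and hypothetical helper names, tests whether any partition of a given set of taxon sets has a cross triple, for the two sets used above and, as a contrast, for $\{\{1,2,3\}, \{1,4,5\}\}$.

```python
from itertools import combinations


def set_partitions(items):
    """All partitions of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        yield [[first]] + p
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]


def cross_triples(partition):
    block_taxa = [set().union(*block) for block in partition]
    all_taxa = set().union(*block_taxa)
    return {frozenset(t) for t in combinations(sorted(all_taxa), 3)
            if not any(set(t) <= bt for bt in block_taxa)}


def has_cross_triple(S):
    """True if some partition of S has at least one cross triple."""
    return any(cross_triples(p) for p in set_partitions([frozenset(X) for X in S]))


S_51 = [{1, 2, 3}, {1, 2}, {2, 3}]        # example discussed after Prop. 5.1 of [1]
S_52 = [{1, 2, 3}, {1, 2}, {1, 3}, {1}]   # example discussed after Prop. 5.2 of [1]
print(has_cross_triple(S_51), has_cross_triple(S_52))  # False False
print(has_cross_triple([{1, 2, 3}, {1, 4, 5}]))        # True
```

Whichever block receives $\{1,2,3\}$ already covers every taxon, so no triple can be split across blocks.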
The mistake made in the above propositions is understandable considering the main purpose of the grove idea: only sets of taxon sets with the potential to deliver new information should be considered. As explained before, this idea is not really reflected in Definition 2.4, the definition of groves as given by the authors. However, it will be shown in Section 3.4 that simply excluding trivial cases (e.g. groves containing a set of all taxa) from the definition unfortunately does not improve the situation with regard to Conjecture 1.1.

3.4. Informative groves and strictly informative groves 

Now we modify the concept of groves in order to guarantee more informativeness. We have seen, for instance, that the example in the proof of Proposition 3.9 leading to the problematic case only employs groves $\mathcal{S}_1$ and $\mathcal{S}_2$ which each contain one set that includes all of their taxa. So any supertree of the sets in $\mathcal{S}_1$ will always be identical to the tree assigned to this taxon set (and the same holds for $\mathcal{S}_2$). Therefore, whilst both $\mathcal{S}_1$ and $\mathcal{S}_2$ fulfill the requirements of groves, they do not represent very interesting cases, as no supertree of them can ever reveal new information. We now examine Conjecture 1.1 for more interesting cases, namely informative and strictly informative groves.

Definition 3.10.

1. A grove $\mathcal{S}$ is called informative if there exists a partition $\pi$ of $\mathcal{S}$ such that there is an informative topology assignment on $\mathcal{S}$ w.r.t. $\pi$.

2. A grove $\mathcal{S}$ is called strictly informative if for all partitions $\pi$ of $\mathcal{S}$ there is an informative topology assignment on $\mathcal{S}$ w.r.t. $\pi$.

We now state a result analogous to Proposition 3.9 for informative groves.

Proposition 3.11. The union of intersecting informative groves is not necessarily 
a grove. 

Proof. We provide an explicit counterexample. Let $S_1 := \{\{1,2,3\}, \{2,3,4\}, \{4\}\}$ and $S_2 := \{\{4,5,6\}, \{5,6,7\}, \{4\}\}$. Then $S_1$ is an informative grove, as the only way to partition $S_1$ such that there are cross triples is to split {1,2,3} and {2,3,4} apart. In that case the cross triples are {1,2,4} and {1,3,4}, which are resolved, for instance, when the topology assignment ((1,2),3) and ((2,3),4) is chosen, since then the only possible supertree is (((1,2),3),4). This is analogous to the example depicted in Figure 3. By the same argument, $S_2$ is an informative grove. Now $S_1 \cap S_2 = \{\{4\}\} \neq \emptyset$. But $S_1 \cup S_2 = \{\{1,2,3\}, \{2,3,4\}, \{4\}, \{4,5,6\}, \{5,6,7\}\}$ is not a grove. As above, the intuitive reason is that for the partition $\pi := \{1,2,3\}, \{2,3,4\}, \{4\} \mid \{4,5,6\}, \{5,6,7\}$, no matter which topology assignment is chosen, the fact that $S_1$ and $S_2$ overlap in just one taxon, namely taxon 4, is not enough to fix the supertrees. Basically, one can choose compatible trees for all taxon sets and compute a supertree for $S_1$ and one for $S_2$ separately. Then, by Lemma 4.1 of [1], as the taxa of $S_1$ and $S_2$ intersect in only one taxon, the two trees are compatible and can be combined in any possible way to form a common supertree, as the only condition is that both contain taxon 4. Therefore, no cross triple of this partition is resolved (although there are cross triples with respect to $\pi$, e.g. {1,2,5}). Thus, $S_1 \cup S_2$ is not a grove. □
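The cross-triple bookkeeping in this proof can be reproduced mechanically. The following Python sketch enumerates all splits (bipartitions) of a collection of taxon sets and lists their cross triples. It assumes, consistently with the examples in this section, that a triple is a cross triple of a split $S_A \mid S_B$ if its taxa occur in the collection but it is contained neither in the taxa of $S_A$ nor in those of $S_B$. Tripartitions are not covered, and the helper names (`taxa`, `cross_triples`, `splits`) are hypothetical.

```python
from itertools import combinations


def taxa(sets):
    """Union of all taxa occurring in a collection of taxon sets."""
    return frozenset().union(*sets)


def cross_triples(side_a, side_b):
    """Triples contained in neither side's taxa (assumed cross-triple definition)."""
    xa, xb = taxa(side_a), taxa(side_b)
    return {
        frozenset(t)
        for t in combinations(sorted(xa | xb), 3)
        if not set(t) <= xa and not set(t) <= xb
    }


def splits(collection):
    """Yield every split S_A | S_B of a list of taxon sets into two non-empty sides."""
    items = list(collection)
    first, rest = items[0], items[1:]
    # Keeping the first set on side A lists each unordered split exactly once.
    for r in range(len(rest)):
        for chosen in combinations(range(len(rest)), r):
            side_a = [first] + [rest[i] for i in chosen]
            side_b = [rest[i] for i in range(len(rest)) if i not in chosen]
            yield side_a, side_b


# S_1 from the proof of Proposition 3.11.
S1 = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({4})]
for side_a, side_b in splits(S1):
    found = sorted(sorted(t) for t in cross_triples(side_a, side_b))
    print([sorted(x) for x in side_a], "|", [sorted(x) for x in side_b], found)
# Only the split {1,2,3} | {2,3,4}, {4} yields cross triples: {1,2,4} and {1,3,4}.
```

Calling `cross_triples` with the two sides of $\pi$ likewise returns, among others, the triple {1,2,5} mentioned above.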

It is worth noting that when informative groves intersect, their union does not merely lose informativeness; it may even lose the grove property itself. We now show a weaker statement for strictly informative groves.

Proposition 3.12. The union of intersecting strictly informative groves is not 
necessarily strictly informative (even if it is a grove). 

Proof. We provide an explicit counterexample. Let $S_1 := \{\{1,2,3\}, \{2,3,4\}\}$ and $S_2 := \{\{1,2,3\}, \{1,3,4\}\}$. Then $S_1$ is a strictly informative grove, as the only way to partition $S_1$ is to split {1,2,3} and {2,3,4} apart. The cross triples are {1,2,4} and {1,3,4}, which are resolved, for instance, when the topology assignment ((1,2),3) and ((2,3),4) is chosen, since then the only possible supertree is (((1,2),3),4). This is analogous to the situation depicted in Figure 3. By the same argument, $S_2$ is a strictly informative grove. Now $S_1 \cap S_2 = \{\{1,2,3\}\} \neq \emptyset$. But $S_1 \cup S_2 = \{\{1,2,3\}, \{2,3,4\}, \{1,3,4\}\}$ is not strictly informative. This can be seen by examining the split {1,2,3} | {2,3,4}, {1,3,4}, which has no cross triples and therefore does not fulfill the strict informativeness criterion. □
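Under the same assumed definition of cross triples (a triple contained in neither side's taxa), splits without cross triples are easy to detect: if neither side covers all taxa, then a taxon missing from side A and a taxon missing from side B, together with any third taxon, already form a cross triple. Hence, with at least three taxa, a split has no cross triples exactly when one side covers every taxon. The hypothetical helper `uncovered_splits` below uses this observation to list such splits; each one witnesses that the collection is not strictly informative.

```python
from itertools import combinations


def uncovered_splits(collection):
    """Splits S_A | S_B in which one side's taxa already cover every taxon.

    Assuming at least three taxa overall, these are exactly the splits
    without cross triples (under the assumed cross-triple definition).
    """
    items = list(collection)
    all_taxa = frozenset().union(*items)
    found = []
    for r in range(1, len(items)):
        for chosen in combinations(range(len(items)), r):
            if 0 not in chosen:
                continue  # keep set 0 on side A so that each split is listed once
            side_a = [items[i] for i in chosen]
            side_b = [items[i] for i in range(len(items)) if i not in chosen]
            if (frozenset().union(*side_a) == all_taxa
                    or frozenset().union(*side_b) == all_taxa):
                found.append((side_a, side_b))
    return found


S1 = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
S2 = [frozenset({1, 2, 3}), frozenset({1, 3, 4})]
print(uncovered_splits(S1), uncovered_splits(S2))  # [] []
print(len(uncovered_splits(S1 + [frozenset({1, 3, 4})])))  # 3
```

For $S_1 \cup S_2$, every split is listed, because any two of its taxon sets together already cover {1,2,3,4}; the split used in the proof is one of them.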

While Proposition 3.12 shows that even strictly informative groves do not fulfill the properties of Conjecture 1.1 (2), we conjecture that a scenario such as the one in Proposition 3.11 is not possible for strictly informative groves.

Conjecture 3.13. The union of two intersecting strictly informative groves is 
a grove. 



3.5. 2-overlap groves 

As the previous sections show, the grove concept introduced by Ane et al. [1] does not fulfill the properties specified in Conjecture 1.1, even if it is interpreted in a stricter way as in Section 3.4. As explained in Section 1, Conjecture 1.1 (1) is important for the search for groves in databases. We now introduce an alternative grove definition, which fulfills the conjecture and thus simplifies the search for groves. In order to do so, we first need to define 2-overlap graphs.

Definition 3.14. Let $S$ be a set of taxon sets $X_1, \ldots, X_k$. Its 2-overlap graph is the graph $G := (S, E)$, where $\{X_i, X_j\} \in E$ for $i \neq j$ if and only if $X_i$ and $X_j$ share at least 2 taxa, i.e. $|X_i \cap X_j| \geq 2$.

Definition 3.15 (2-overlap grove). Let $S$ be a set of taxon sets. Then $S$ is called a 2-overlap grove if and only if its 2-overlap graph is connected.
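Definitions 3.14 and 3.15 translate directly into code. The following Python sketch builds the 2-overlap graph on the indices of the taxon sets and tests connectivity by depth-first search. The function names are illustrative, and treating the empty collection as connected is an assumption of the sketch, not of the definition.

```python
from itertools import combinations


def two_overlap_graph(collection):
    """Adjacency sets of the 2-overlap graph (Definition 3.14).

    Vertices are the indices of the taxon sets; i and j are adjacent
    whenever the corresponding sets share at least two taxa.
    """
    sets = [frozenset(x) for x in collection]
    adj = {i: set() for i in range(len(sets))}
    for i, j in combinations(range(len(sets)), 2):
        if len(sets[i] & sets[j]) >= 2:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def is_two_overlap_grove(collection):
    """Definition 3.15: the 2-overlap graph must be connected (checked by DFS)."""
    adj = two_overlap_graph(collection)
    if not adj:
        return True  # assumption: the empty collection counts as connected
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(adj)


print(is_two_overlap_grove([{1, 2}, {1, 2, 3}]))                  # True
print(is_two_overlap_grove([{2, 3, 6}, {1, 4, 5}, {2, 4, 5, 7}]))  # False
```

Both calls use the two collections of Figure 4; the second fails because {2,3,6} shares only taxon 2 with {2,4,5,7} and nothing with {1,4,5}.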
Example. Figure 4 shows an example $S$ whose 2-overlap graph is connected, which makes $S$ a 2-overlap grove, as well as an example $S'$ whose 2-overlap graph consists of two connected components, so that $S'$ is not a 2-overlap grove.

We now show that 2-overlap groves are indeed groves. 

Theorem 3.16 (Theorem 1.1 of [1]). Every set of taxon sets whose 2-overlap graph is connected is a grove.

Corollary 3.17. Every 2-overlap grove is a grove. 

Corollary 3.17 follows directly from Theorem 3.16. So all 2-overlap groves are groves, but, as can be seen for instance in Proposition 3.9, not all groves are 2-overlap groves. There, we used the groves $S_1 := \{\{1,2,3\}, \{1\}\}$ and $S_2 := \{\{1,4,5\}, \{1\}\}$, whose 2-overlap graphs are both disconnected, to show that the union of intersecting groves need not be a grove. Next we show that such issues cannot arise with 2-overlap groves.

Theorem 3.18. If two 2-overlap groves intersect, their union is also a 2-overlap 
grove. 

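Proof. Let $S_1$, $S_2$ be two 2-overlap groves such that $S_1 \cap S_2 \neq \emptyset$. Then there is a taxon set $X \in S_1 \cap S_2$, which is a vertex of both 2-overlap graphs. As the 2-overlap graphs of $S_1$ and $S_2$ are both connected, every vertex of either graph is connected to $X$ by a path. Moreover, whether two taxon sets share at least two taxa does not depend on the collection they belong to, so both graphs are subgraphs of the 2-overlap graph of $S := S_1 \cup S_2$. Thus, $X$ connects the two 2-overlap graphs, and therefore the 2-overlap graph of $S$ is connected. By Definition 3.15, this implies that $S$ is a 2-overlap grove. This completes the proof. □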
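The statement of Theorem 3.18 can also be sanity-checked empirically. The sketch below, which uses union-find instead of a graph search, draws random collections of taxon sets over seven taxa (the sizes, trial count and seed are arbitrary choices) and asserts that whenever two 2-overlap groves share a taxon set, their union is again a 2-overlap grove. Such a test is no substitute for the proof; it merely illustrates the statement, and `check_union_property` is a hypothetical name.

```python
import random
from itertools import combinations


def is_two_overlap_grove(collection):
    """True iff the 2-overlap graph (edges: >= 2 shared taxa) is connected."""
    sets = [frozenset(x) for x in collection]
    parent = list(range(len(sets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(sets)), 2):
        if len(sets[i] & sets[j]) >= 2:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(sets))}) <= 1


def check_union_property(trials=2000, seed=0):
    """Randomized check of Theorem 3.18; returns the number of cases tested."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        pool = [frozenset(rng.sample(range(1, 8), rng.randint(2, 4))) for _ in range(6)]
        s1 = set(rng.sample(pool, rng.randint(1, 4)))
        s2 = set(rng.sample(pool, rng.randint(1, 4)))
        # Only pairs of intersecting 2-overlap groves satisfy the hypothesis.
        if s1 & s2 and is_two_overlap_grove(s1) and is_two_overlap_grove(s2):
            checked += 1
            assert is_two_overlap_grove(s1 | s2)
    return checked


print(check_union_property())
```

Collections are represented as Python sets of frozensets, so the union `s1 | s2` merges identical taxon sets, matching the reading of $S_1 \cup S_2$ as a set of taxon sets.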

As Lemma 3.8 also applies to 2-overlap groves, it turns out that 2-overlap groves have all the properties required by Conjecture 1.1 (not just the second one), and thus resolve all problems inherent to the grove concept of Definition 2.4. And as all 2-overlap groves are also groves, they also have the potential to construct informative supertrees.

Remark 3.19. Note that while all 2-overlap groves are also groves, they are 
not directly related to strictly informative groves, as neither property im- 
plies the other. We show this with the following two examples. 

1. The set of taxon sets $S = \{\{1,2\}, \{1,2,3\}\}$ is a 2-overlap grove, because its 2-overlap graph is connected, as shown in Figure 4. However, as the only possible partition, namely the split {1,2} | {1,2,3}, does not have any cross triples and thus no resolved ones, $S$ is a grove, but not a strictly informative one. So it has to be noted that the 2-overlap grove concept still includes some groves containing one set with all taxa, in which case a supertree cannot provide new information.

2. The set of taxon sets $S = \{\{2,3,6\}, \{1,4,5\}, \{2,4,5,7\}\}$ is not a 2-overlap grove, because its 2-overlap graph is not connected, as shown in Figure 4. However, choosing the topology assignment $V := \{((2,3),6), ((1,4),5), (((4,5),2),7)\}$ leads to supertrees in which the positions of taxa 3 and 6 vary from tree to tree, but which all contain the subtree $T := ((((1,4),5),2),7)$ (the so-called 'maximum agreement subtree'), as shown in Figure 5. Moreover, the triple {1,3,7} is a cross triple for all possible partitions of $S$, and this cross triple is resolved by the maximum agreement subtree. Thus, this topology assignment resolves a cross triple for all possible partitions. Therefore, $S$ is a strictly informative grove.
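The compatibility claims in this remark can be checked with rooted triples. A rooted binary tree is determined by the triples ab|c it displays, where ab|c holds if some clade contains a and b but not c. The Python sketch below encodes trees as nested tuples (an encoding chosen for illustration) and confirms that the subtree $T$ displays both input trees ((1,4),5) and (((4,5),2),7) of the topology assignment $V$; it does not enumerate all supertrees, and the helper names are hypothetical.

```python
from itertools import combinations


def leaves(tree):
    """Leaf labels of a nested-tuple tree such as ((1, 4), 5)."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Leaf sets below every internal node (the clades of the tree)."""
    if not isinstance(tree, tuple):
        return []
    result = [leaves(tree)]
    for child in tree:
        result.extend(clusters(child))
    return result


def rooted_triples(tree):
    """Map each 3-leaf set {a, b, c} to the pair {a, b} if the tree displays ab|c."""
    clades = clusters(tree)
    out = {}
    for trip in combinations(sorted(leaves(tree)), 3):
        for pair in combinations(trip, 2):
            (c,) = set(trip) - set(pair)
            if any(set(pair) <= clade and c not in clade for clade in clades):
                out[frozenset(trip)] = frozenset(pair)
    return out


def displays(big, small):
    """True if `big` resolves every triple of the (binary) tree `small` the same way."""
    big_triples = rooted_triples(big)
    return all(big_triples.get(t) == p for t, p in rooted_triples(small).items())


T = ((((1, 4), 5), 2), 7)
print(displays(T, ((1, 4), 5)))                   # True
print(displays(T, (((4, 5), 2), 7)))              # True
print(displays((((1, 2), 3), 4), ((2, 3), 4)))    # True (cf. Proposition 3.11)
```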



[Figure 4 graphics: 2-overlap graphs with vertices {1,2}, {1,2,3} and with vertices {2,3,6}, {1,4,5}, {2,4,5,7}.]



Figure 4. The set $S = \{\{1,2\}, \{1,2,3\}\}$ is a 2-overlap grove but not a strictly informative grove. For the set $S' = \{\{2,3,6\}, \{1,4,5\}, \{2,4,5,7\}\}$, the situation is vice versa.







Figure 5. The topology assignment $V := \{((2,3),6), ((1,4),5), (((4,5),2),7)\}$ leads to various supertrees, as the positions of taxa 3 and 6 vary. However, all supertrees contain the maximum agreement subtree $T := ((((1,4),5),2),7)$, so that in any case the triple {1,3,7} gets resolved.



4. Discussion 

In this paper, we showed that the concept of groves as introduced by Ane et al. [1] can be reduced to the much simpler concept of tripartition groves. This drastically reduces the search for groves in databases, from investigating all partitions to analyzing only splits and tripartitions, which is a great improvement. But we also showed that groves as introduced by Ane et al. unfortunately do not fulfill the requirements of Conjecture 1.1, which implies that they are hard to find in databases of taxon sets even when our simplification is considered. We also investigated slightly modified versions of the original grove definition in order to enforce more informativeness, but even then there were similar drawbacks in the grove concept. Finally, we proved that our new concept of 2-overlap groves fulfills all properties of the conjecture, while these 2-overlap groves are also groves in the original sense of the definition. So they combine the good properties of groves, for instance the potential for informative supertrees, with a property that makes them easier to identify in databases. It should be noted that the latter is not only due to the fact that Conjecture 1.1 (2) holds for 2-overlap groves, but also to the fact that the 2-overlap graph of a database can be constructed more efficiently and more easily than investigating all cross triples of all possible splits and tripartitions.
Moreover, the 2-overlap graph of a database immediately displays all maximal 2-overlap groves in the database (namely as its connected components) and thus quickly shows which taxon sets should be combined in order to find new information in the corresponding phylogenetic supertrees. It should be noted, though, that our concept of 2-overlap groves only covers a subset of the groves of Ane et al. [1], as all 2-overlap groves are groves but the converse does not hold. Therefore, some taxon sets which overlap in only a single taxon are not covered by our concept, even though they may form groves. However, we claim that these examples are rather artificial in the sense that, in practice, biologists rarely wish to combine trees with only one taxon in common. So the simplification gained by restricting ourselves to 2-overlaps seems to outweigh the possible drawbacks of considering fewer cases. Altogether, we think this new concept is promising and merits further research. For instance, practical analyses on real databases would be very helpful.
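The claim that the 2-overlap graph of a database immediately displays all maximal 2-overlap groves can be made concrete: the maximal 2-overlap groves are exactly the connected components of that graph, since any larger subset would contain a taxon set that is not connected to the rest. The following Python sketch (a hypothetical helper, not a tool from this paper) computes these components with union-find over all pairs of taxon sets.

```python
from itertools import combinations


def maximal_two_overlap_groves(database):
    """Group the taxon sets of a database into the connected components of its
    2-overlap graph; each component is a maximal 2-overlap grove."""
    sets = [frozenset(x) for x in database]
    parent = list(range(len(sets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge the components of every pair of taxon sets sharing >= 2 taxa.
    for i, j in combinations(range(len(sets)), 2):
        if len(sets[i] & sets[j]) >= 2:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(sets):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


database = [{1, 2}, {1, 2, 3}, {2, 3, 6}, {1, 4, 5}, {2, 4, 5, 7}]
for grove in maximal_two_overlap_groves(database):
    print([sorted(s) for s in grove])
```

Comparing all pairs costs a quadratic number of set intersections; for large databases, indexing taxon sets by the taxon pairs they contain might reduce this, but that refinement is not shown here.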